Goto

Collaborating Authors

 dominican republic




Learning Individual Reproductive Behavior from Aggregate Fertility Rates via Neural Posterior Estimation

arXiv.org Artificial Intelligence

Age-specific fertility rates (ASFRs) provide the most extensive record of reproductive change, but their aggregate nature obscures the individual-level behavioral mechanisms that drive fertility trends. To bridge this micro-macro divide, we introduce a likelihood-free Bayesian framework that couples a demographically interpretable, individual-level simulation model of the reproductive process with Sequential Neural Posterior Estimation (SNPE). We show that this framework successfully recovers core behavioral parameters governing contemporary fertility, including preferences for family size, reproductive timing, and contraceptive failure, using only ASFRs. The framework's effectiveness is validated on cohorts from four countries with diverse fertility regimes. Most compellingly, the model, estimated solely on aggregate data, successfully predicts out-of-sample distributions of individual-level outcomes, including age at first sex, desired family size, and birth intervals. Because our framework yields complete synthetic life histories, it significantly reduces the data requirements for building microsimulation models and enables behaviorally explicit demographic forecasts.


Democratising Artificial Intelligence for Pandemic Preparedness and Global Governance in Latin American and Caribbean Countries

arXiv.org Artificial Intelligence

Infectious diseases, transmitted directly or indirectly, are among the leading causes of epidemics and pandemics. Consequently, several open challenges exist in predicting epidemic outbreaks, detecting variants, tracing contacts, discovering new drugs, and fighting misinformation. Artificial Intelligence (AI) can provide tools to deal with these scenarios, demonstrating promising results in the fight against the COVID-19 pandemic. AI is becoming increasingly integrated into various aspects of society. However, ensuring that AI benefits are distributed equitably and that they are used responsibly is crucial. Multiple countries are creating regulations to address these concerns, but the borderless nature of AI requires global cooperation to define regulatory and guideline consensus. Considering this, The Global South AI for Pandemic & Epidemic Preparedness & Response Network (AI4PEP) has developed an initiative comprising 16 projects across 16 countries in the Global South, seeking to strengthen equitable and responsive public health systems that leverage Southern-led responsible AI solutions to improve prevention, preparedness, and response to emerging and re-emerging infectious disease outbreaks. This opinion introduces our branches in Latin American and Caribbean (LAC) countries and discusses AI governance in LAC in the light of biotechnology. Our network in LAC has high potential to help fight infectious diseases, particularly in low- and middle-income countries, generating opportunities for the widespread use of AI techniques to improve the health and well-being of their communities.


An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation

arXiv.org Artificial Intelligence

Retrieval-augmented generation integrates the capabilities of large language models with relevant information retrieved from an extensive corpus, yet encounters challenges when confronted with real-world noisy data. One recent solution is to train a filter module to find relevant content but only achieve suboptimal noise compression. In this paper, we propose to introduce the information bottleneck theory into retrieval-augmented generation. Our approach involves the filtration of noise by simultaneously maximizing the mutual information between compression and ground output, while minimizing the mutual information between compression and retrieved passage. In addition, we derive the formula of information bottleneck to facilitate its application in novel comprehensive evaluations, the selection of supervised fine-tuning data, and the construction of reinforcement learning rewards. Experimental results demonstrate that our approach achieves significant improvements across various question answering datasets, not only in terms of the correctness of answer generation but also in the conciseness with $2.5\%$ compression rate.


A Controversial Facial-Recognition Company Quietly Expands Into Latin America

TIME - Tech

For the past three months, a small encrypted group chat of Latin American officials who investigate online child-exploitation cases has been lighting up with reports of raids, arrests, and rescued minors in half a dozen countries. The successes are the result of a recent trial of a facial-recognition tool given to a group of Latin American law-enforcement officials, investigators, and prosecutors by the American company Clearview AI. During a five-day operation in Ecuador in early March, participants from 10 countries including Argentina, Brazil, Colombia, the Dominican Republic, El Salvador, and Peru were given access to Clearview's technology, which allows them to upload images and run them through a database of billions of public photos scraped from the Internet. "Normally it takes at least several days for a child to be identified, and sometimes there are victims that have not been identified for years," says Guillermo Galarza Abizaid, the vice president in charge of partnerships and law enforcement at the Virginia-based nonprofit International Centre for Missing and Exploited Children (ICMEC), which organized the event. The group used the facial-recognition tool to analyze a total of 2,198 images and 995 videos, hundreds of them from cold cases.


Can LLMs Augment Low-Resource Reading Comprehension Datasets? Opportunities and Challenges

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated impressive zero shot performance on a wide range of NLP tasks, demonstrating the ability to reason and apply commonsense. A relevant application is to use them for creating high quality synthetic datasets for downstream tasks. In this work, we probe whether GPT-4 can be used to augment existing extractive reading comprehension datasets. Automating data annotation processes has the potential to save large amounts of time, money and effort that goes into manually labelling datasets. In this paper, we evaluate the performance of GPT-4 as a replacement for human annotators for low resource reading comprehension tasks, by comparing performance after fine tuning, and the cost associated with annotation. This work serves to be the first analysis of LLMs as synthetic data augmenters for QA systems, highlighting the unique opportunities and challenges. Additionally, we release augmented versions of low resource datasets, that will allow the research community to create further benchmarks for evaluation of generated datasets.


Challenges and Applications of Large Language Models

arXiv.org Artificial Intelligence

Large Language Models (LLMs) went from non-existent to ubiquitous in the machine learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful application areas. In this paper, we aim to establish a systematic set of open problems and application successes so that ML researchers can comprehend the field's current state more quickly and become productive.


Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text

Journal of Artificial Intelligence Research

Evaluation practices in natural language generation (NLG) have many known flaws, but improved evaluation approaches are rarely widely adopted. This issue has become more urgent, since neural generation models have improved to the point where their outputs can often no longer be distinguished based on the surface-level features that older metrics rely on. This paper surveys the issues with human and automatic model evaluations and with commonly used datasets in NLG that have been pointed out over the past 20 years. We summarize, categorize, and discuss how researchers have been addressing these issues and what their findings mean for the current state of model evaluations. Building on those insights, we lay out a long-term vision for evaluation research and propose concrete steps for researchers to improve their evaluation processes. Finally, we analyze 66 generation papers from recent NLP conferences in how well they already follow these suggestions and identify which areas require more drastic changes to the status quo.


Enhancing Dialogue Summarization with Topic-Aware Global- and Local- Level Centrality

arXiv.org Artificial Intelligence

Dialogue summarization aims to condense a given dialogue into a simple and focused summary text. Typically, both the roles' viewpoints and conversational topics change in the dialogue stream. Thus how to effectively handle the shifting topics and select the most salient utterance becomes one of the major challenges of this task. In this paper, we propose a novel topic-aware Global-Local Centrality (GLC) model to help select the salient context from all sub-topics. The centralities are constructed at both the global and local levels. The global one aims to identify vital sub-topics in the dialogue and the local one aims to select the most important context in each sub-topic. Specifically, the GLC collects sub-topic based on the utterance representations. And each utterance is aligned with one sub-topic. Based on the sub-topics, the GLC calculates global- and local-level centralities. Finally, we combine the two to guide the model to capture both salient context and sub-topics when generating summaries. Experimental results show that our model outperforms strong baselines on three public dialogue summarization datasets: CSDS, MC, and SAMSUM. Further analysis demonstrates that our GLC can exactly identify vital contents from sub-topics.~\footnote{\url{https://github.com/xnliang98/bart-glc}}